# A large-scale study on research code quality and execution

<a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FUZLXSZ">![](https://img.shields.io/badge/DOI-10.7910%2FDVN%2FUZLXSZ-orange)</a> [![arXiv](https://img.shields.io/badge/arXiv-2103.12793-b31b1b.svg?style=flat)](https://arxiv.org/abs/2103.12793) [![PyPi license](https://badgen.net/pypi/license/pip/)](https://pypi.com/project/pip/) [![Open Source Love svg1](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/) 

## Step 1. `get-dois` 

The code in `get-dois` queries the Harvard Dataverse repository and collects the DOIs of datasets that contain R code.
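As an illustration of this step, here is a minimal sketch that builds a Dataverse Search API request and extracts dataset DOIs from a response. It is not the code in `get-dois`: the query string, the helper names, and the reliance on the `dataset_persistent_id` field of file results are assumptions made for the example.

```python
from urllib.parse import urlencode

# Public Search API endpoint of the Harvard Dataverse.
SEARCH_ENDPOINT = "https://dataverse.harvard.edu/api/search"


def build_search_url(query, start=0, per_page=100):
    """Build a Search API URL that pages through file results.

    `query` is passed through unchanged; the query that selects
    R files is an assumption left to the caller.
    """
    params = {"q": query, "type": "file", "start": start, "per_page": per_page}
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_dataset_dois(response):
    """Return the unique dataset DOIs in one parsed JSON response.

    Assumes each file item carries the DOI of its parent dataset
    under `dataset_persistent_id`; items without it are skipped.
    """
    seen = set()
    dois = []
    for item in response.get("data", {}).get("items", []):
        doi = item.get("dataset_persistent_id")
        if doi and doi not in seen:
            seen.add(doi)
            dois.append(doi)
    return dois
```

A caller would fetch each URL, parse the JSON body, and increase `start` by `per_page` until no items are returned.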

## Step 2. `aws-cli` 

The list of DOIs is used to define jobs for AWS Batch. The code in `aws-cli` submits these jobs to the batch queue, where they wait until resources become available to run them.
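The sketch below shows one way to turn a list of DOIs into AWS Batch submissions, one job per DOI. It is an assumption-laden example rather than the code in `aws-cli`: the queue name, the job definition name, and the `DOI` environment variable are hypothetical, and `client` is expected to be a Batch client such as the one returned by `boto3.client("batch")`.

```python
import re


def job_name_for(doi):
    """Derive a valid Batch job name from a DOI.

    Batch job names may contain only letters, digits, hyphens, and
    underscores, up to 128 characters, so other characters are replaced.
    """
    return re.sub(r"[^A-Za-z0-9_-]", "-", doi)[:128]


def build_job_request(doi, queue, definition):
    """Build the keyword arguments for one `submit_job` call."""
    return {
        "jobName": job_name_for(doi),
        "jobQueue": queue,            # hypothetical queue name
        "jobDefinition": definition,  # hypothetical job definition name
        # Pass the DOI to the container through an environment variable
        # (the variable name `DOI` is an assumption).
        "containerOverrides": {"environment": [{"name": "DOI", "value": doi}]},
    }


def submit_jobs(client, dois, queue, definition):
    """Submit one job per DOI and return the job IDs reported by Batch."""
    job_ids = []
    for doi in dois:
        response = client.submit_job(**build_job_request(doi, queue, definition))
        job_ids.append(response["jobId"])
    return job_ids
```

Keeping the request construction separate from the submission makes it possible to inspect or log the requests before anything is sent to AWS.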

## Step 3. `docker` 

When a job leaves the queue, it starts a container from a pre-built Docker image. The image contains code that retrieves a replication package, executes its R code, and collects data. The code in `docker` prepares this image.
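To make the in-container workflow concrete, the following sketch finds the R files in a downloaded replication package and runs each one with `Rscript`, recording whether it succeeded. This is a simplified assumption of what such a job could do, not the code shipped in `docker`: the helper names, the one-hour timeout, and the recorded fields are all hypothetical.

```python
import subprocess
from pathlib import Path


def find_r_files(root):
    """Return all files under `root` with a .R or .r extension, sorted."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".r"
    )


def run_script(path, command=("Rscript",), timeout=3600):
    """Run one script from its own directory and record the outcome.

    `command` defaults to `Rscript`, which must be on the PATH;
    the timeout value is an assumption chosen for illustration.
    """
    path = Path(path)
    try:
        completed = subprocess.run(
            [*command, path.name],
            cwd=path.parent,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"file": str(path), "status": "timeout", "stderr": ""}
    status = "success" if completed.returncode == 0 else "error"
    # Keep only the tail of stderr, where R reports the failing call.
    return {"file": str(path), "status": status, "stderr": completed.stderr[-500:]}
```

Running each script from its own directory keeps relative paths inside the script pointing at the package's files.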

## Step 4. `analysis` 

All collected data is retrieved and analyzed in `analysis`.

## Figure

<img src="https://i.imgur.com/DOBB1LI.jpeg" height="450" />
